FBP: A Frontier-Based Tree-Pruning Algorithm

Authors

  • Xiaoming Huo
  • Seoung Bum Kim
  • Kwok-Leung Tsui
  • Shuchun Wang
Abstract

A frontier-based tree-pruning algorithm (FBP) is proposed. The new method has an order of computational complexity comparable to that of Cost-Complexity Pruning (CCP). Regarding tree pruning, it provides a full spectrum of information; specifically, (1) given the value of the penalization parameter λ, it gives the minimum size of a decision tree; (2) given the size of a decision tree, it provides the range of the penalization parameter λ within which the complexity-penalization approach will render such a tree size; (3) it finds the sizes of trees that are inadmissible: no matter what the value of the penalty parameter is, the resulting tree based on a complexity-penalization framework will never have these sizes. Simulations on real datasets reveal a ‘surprise’: in the complexity-penalization approach, most of the tree sizes are inadmissible. FBP facilitates a more faithful implementation of the principle of Cross-Validation (CV). Simulations seem to favor such an approach. Utilizing FBP, a stability analysis for CV is proposed.
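The three kinds of information listed in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the smallest achievable error e(k) for each tree size k has already been computed by some other means, and that the penalized objective has the form e(k) + λ·k. Under that assumption, the admissible sizes are the vertices of the lower convex frontier of the points (k, e(k)). The function names `admissible_sizes` and `size_for_lambda` and the error values are hypothetical, chosen only for illustration.

```python
import math
from typing import Dict, List, Tuple


def admissible_sizes(errors: Dict[int, float]) -> List[Tuple[int, float, float]]:
    """Return (size, lambda_low, lambda_high) for every admissible size.

    `errors` maps a tree size k to the assumed minimal error e(k).
    A size is admissible if it minimizes e(k) + lam * k for some lam >= 0.
    """
    points = sorted(errors.items())

    # Lower convex hull of (k, e(k)), scanning sizes in increasing order.
    hull: List[Tuple[int, float]] = []
    for k, e in points:
        while len(hull) >= 2:
            (k1, e1), (k2, e2) = hull[-2], hull[-1]
            # Pop the middle point if it lies on or above the chord.
            if (e2 - e1) * (k - k1) >= (e - e1) * (k2 - k1):
                hull.pop()
            else:
                break
        hull.append((k, e))

    # Keep only the part where the error strictly decreases: a larger
    # tree that does not reduce the error is never preferred.
    frontier = [hull[0]]
    for k, e in hull[1:]:
        if e < frontier[-1][1]:
            frontier.append((k, e))
        else:
            break

    # The lambda range of a vertex is bounded by the negated slopes
    # of the two frontier edges that meet at it.
    result = []
    for i, (k, e) in enumerate(frontier):
        if i == 0:
            high = math.inf
        else:
            pk, pe = frontier[i - 1]
            high = (pe - e) / (k - pk)
        if i == len(frontier) - 1:
            low = 0.0
        else:
            nk, ne = frontier[i + 1]
            low = (e - ne) / (nk - k)
        result.append((k, low, high))
    return result


def size_for_lambda(errors: Dict[int, float], lam: float) -> int:
    """Minimum tree size that minimizes e(k) + lam * k (brute force)."""
    return min(errors, key=lambda k: (errors[k] + lam * k, k))


# Hypothetical error values, for illustration only.
errors = {1: 10.0, 2: 6.0, 3: 5.5, 4: 3.0, 5: 2.75, 6: 2.75}
ranges = admissible_sizes(errors)
inadmissible = sorted(set(errors) - {k for k, _, _ in ranges})
```

With these made-up values, sizes 3 and 6 lie off the frontier, so no value of λ selects them. The brute-force `size_for_lambda` is included only as a cross-check on the λ ranges.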

Similar resources

Evaluation of liquefaction potential based on CPT results using C4.5 decision tree

The prediction of liquefaction potential of soil due to an earthquake is an essential task in Civil Engineering. The decision tree is a tree structure consisting of internal and terminal nodes which process the data to ultimately yield a classification. C4.5 is a known algorithm widely used to design decision trees. In this algorithm, a pruning process is carried out to solve the problem of the...

Experiments with an innovative tree pruning algorithm

The pruning phase is one of the necessary steps in decision tree induction. Existing pruning algorithms tend to have some or all of the following difficulties: 1) lack of theoretical support; 2) high computational complexity; 3) dependence on validation; 4) complicated implementation. The 2-norm pruning algorithm proposed here addresses all of the above difficulties. This paper demonstrates the...

A missing link in root-to-frontier tree pattern matching

Tree pattern matching (tpm) algorithms play an important role in practical applications such as compilers and XML document validation. Many tpm algorithms based on tree automata have appeared in the literature. For reasons of efficiency, these automata are preferably deterministic. Deterministic root-to-frontier tree automata (drftas) are less powerful than nondeterministic ones, and no root-to...

CC4.5: cost-sensitive decision tree pruning

There are many methods to prune decision trees, but the idea of cost-sensitive pruning has received much less investigation even though additional flexibility and increased performance can be obtained from this method. In this paper, we introduce a cost-sensitive decision tree pruning algorithm called CC4.5 based on the C4.5 algorithm. This algorithm uses the same method as C4.5 to construct th...

Journal:
  • INFORMS Journal on Computing

Volume 18

Publication date: 2006